Ambiguous (((Par(t)(it))((ion))(s))(in)) Thai Text

Author

  • Doug Cooper
Abstract

Despite the importance of segmentation to a variety of software applications, almost nothing is known about the characteristics or distribution of ambiguous partitions (e.g. to_pend vs. top_end) in Thai text. Using special-purpose code to investigate a large (~400K-word) text corpus, we extracted 36,267 such sequences, involving 9,253 distinct examples. Of these, a little more than two-fifths involved genuinely ambiguous partitions. We classify partitioning problems into distinct categories, report on many of their statistical and lexical characteristics, and describe heuristics for choosing the correct partition that do not depend on the availability of a large segmented corpus.
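To make "ambiguous partition" concrete, the sketch below enumerates every way a string can be split into words from a lexicon and flags strings with more than one split. It is a minimal illustration, not the author's special-purpose code. The toy English lexicon mirrors the abstract's top_end / to_pend example in place of Thai, and the function names `segmentations` and `is_ambiguous` are hypothetical. Note that this only detects *lexical* ambiguity. As the abstract indicates, only part of such sequences turn out to be genuinely ambiguous, and choosing among the candidates is what the paper's heuristics address. Those heuristics are not reproduced here.

```python
from functools import lru_cache


def segmentations(text, lexicon):
    """Return every way to split `text` into words drawn from `lexicon`.

    Hypothetical helper for illustration: a memoized recursive search
    over all prefixes of each suffix of `text`.
    """
    # Longest word in the lexicon bounds how far ahead we need to look.
    max_len = max(map(len, lexicon), default=0)

    @lru_cache(maxsize=None)
    def split_from(i):
        # An empty suffix has exactly one segmentation: no words at all.
        if i == len(text):
            return [()]
        results = []
        # Try every lexicon word that starts at position i.
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            word = text[i:j]
            if word in lexicon:
                # Extend each segmentation of the remaining suffix.
                for rest in split_from(j):
                    results.append((word,) + rest)
        return results

    return split_from(0)


def is_ambiguous(text, lexicon):
    """True if `text` has more than one lexicon-based partition."""
    return len(segmentations(text, lexicon)) > 1


if __name__ == "__main__":
    toy_lexicon = {"to", "top", "pend", "end"}  # assumed toy lexicon
    print(segmentations("topend", toy_lexicon))  # [('to', 'pend'), ('top', 'end')]
    print(is_ambiguous("topend", toy_lexicon))   # True
```

Both partitions of "topend" have two words, so a simple "fewest words" rule cannot break this tie. Cases like this are why a choice between partitions needs further heuristics.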

Publication date: 1996